Bidirectional Representations Augmented Autoregressive Biological Sequence Generation
Zhang, Xiang, Wei, Jiaqi, Qiu, Zijie, Xu, Sheng, Jin, Zhi, Gao, ZhiQiang, Dong, Nanqing, Sun, Siqi
Autoregressive (AR) models, common in sequence generation, are limited in many biological tasks such as de novo peptide sequencing and protein modeling by their unidirectional nature, failing to capture crucial global bidirectional token dependencies. Non-Autoregressive (NAR) models offer holistic, bidirectional representations but face challenges with generative coherence and scalability. To overcome these limitations, we propose a hybrid framework enhancing AR generation by dynamically integrating rich contextual information from non-autoregressive mechanisms. Our approach couples a shared input encoder with two decoders: a non-autoregressive one learning latent bidirectional biological features, and an AR decoder synthesizing the biological sequence by leveraging these bidirectional features. A novel cross-decoder attention module enables the AR decoder to iteratively query and integrate these bidirectional features, enriching its predictions. This synergy is cultivated via a tailored training strategy with importance annealing for balanced objectives and cross-decoder gradient blocking for stable, focused learning. Evaluations on a demanding nine-species benchmark of de novo peptide sequencing show that our model substantially surpasses AR and NAR baselines. It uniquely harmonizes AR stability with NAR contextual awareness, delivering robust, superior performance on diverse downstream data. This research advances biological sequence modeling techniques and contributes a novel architectural paradigm for augmenting AR models with enhanced bidirectional understanding for complex sequence generation. Code is available at https://github.com/BEAM-Labs/denovo.
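The cross-decoder attention described above can be sketched in a few lines: the AR decoder's hidden states act as queries over the NAR decoder's bidirectional features, and the resulting context is added back through a residual connection. This is a minimal illustrative sketch with randomly initialized weights, not the paper's implementation; the function name, shapes, and single-head formulation are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_decoder_attention(ar_hidden, nar_features, Wq, Wk, Wv):
    """AR decoder states (queries) attend over NAR bidirectional features
    (keys/values); hypothetical single-head sketch."""
    Q = ar_hidden @ Wq            # (T_ar, d)
    K = nar_features @ Wk         # (T_nar, d)
    V = nar_features @ Wv         # (T_nar, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    ctx = softmax(scores) @ V     # (T_ar, d) bidirectional context
    return ar_hidden + ctx        # residual keeps the AR pathway stable

rng = np.random.default_rng(0)
d, T_ar, T_nar = 8, 5, 7
ar = rng.normal(size=(T_ar, d))
nar = rng.normal(size=(T_nar, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
# In training, "cross-decoder gradient blocking" would correspond to
# treating `nar` as a constant (no gradient flows back into the NAR decoder).
out = cross_decoder_attention(ar, nar, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Each AR position thus mixes in a learned summary of the full bidirectional feature sequence before making its next-token prediction.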
Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition
Kaliosis, Panagiotis, Pavlopoulos, John
Handwritten text recognition aims to convert visual input into machine-readable text, and it remains challenging due to the evolving and context-dependent nature of handwriting. Character sets change over time, and character frequency distributions shift across historical periods or regions, often causing models trained on broad, heterogeneous corpora to underperform on specific subsets. To tackle this, we propose a novel loss function that incorporates the Wasserstein distance between the character frequency distribution of the predicted text and a target distribution empirically derived from training data. By penalizing divergence from expected distributions, our approach enhances both accuracy and robustness under temporal and contextual intra-dataset shifts. Furthermore, we demonstrate that character distribution alignment can also improve existing models at inference time without requiring retraining by integrating it as a scoring function in a guided decoding scheme. Experimental results across multiple datasets and architectures confirm the effectiveness of our method in boosting generalization and performance. We open source our code at https://github.com/pkaliosis/fada.
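For histograms over the same ordered support, the 1-Wasserstein distance reduces to the cumulative difference of the two CDFs, which makes the distribution-alignment penalty cheap to compute. The sketch below, with hypothetical helper names, shows this for character frequency distributions; it assumes a fixed character ordering, which is one simple way to give categorical characters a 1D geometry (the paper's exact formulation may differ).

```python
from collections import Counter

def char_freq(text, alphabet):
    """Normalized character frequency histogram over a fixed, ordered alphabet."""
    counts = Counter(c for c in text if c in alphabet)
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in alphabet]

def wasserstein_1d(p, q):
    """W1 between two histograms on the same ordered support:
    sum of |CDF_p - CDF_q| over the support."""
    cdf_diff, dist = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_diff += pi - qi
        dist += abs(cdf_diff)
    return dist

alphabet = "abc"
p = char_freq("aabbc", alphabet)   # predicted text's distribution: [0.4, 0.4, 0.2]
q = char_freq("abccc", alphabet)   # target distribution:           [0.2, 0.2, 0.6]
print(wasserstein_1d(p, q))        # ≈ 0.6
```

Used as a loss term, this penalty pushes the model's predicted character frequencies toward the target distribution; used as a scoring function at inference, it can rerank beam hypotheses without retraining.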
FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities
Grigoryan, Lilit, Bataev, Vladimir, Karpov, Nikolay, Andrusenko, Andrei, Lavrukhin, Vitaly, Ginsburg, Boris
While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present FlexCTC, a novel open-source toolkit for fully GPU-based beam decoding, designed for Connectionist Temporal Classification (CTC) models. Developed entirely in Python and PyTorch, it offers a fast, user-friendly, and extensible alternative to traditional C++, CUDA, or WFST-based decoders. The toolkit features a high-performance, fully batched GPU implementation that eliminates CPU-GPU synchronization and minimizes kernel launch overhead via CUDA Graphs. It also supports advanced contextualization techniques, including GPU-powered N-gram language model fusion and phrase-level boosting. These features enable accurate and efficient decoding, making the toolkit suitable for both research and production use. Advancements in GPU hardware and deep learning architectures have facilitated the full parallelization of many components of automatic speech recognition (ASR) systems on GPUs. Modern ASR encoder architectures, such as Transformers [1], [2] and Conformers [3], are explicitly engineered to leverage this parallelism by enabling simultaneous computation across audio sequences, thereby maximizing GPU utilization and throughput.
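To make concrete what CTC beam decoding computes, here is a minimal CPU prefix beam search in the probability domain, maintaining for each prefix the probability of paths ending in blank versus non-blank so that repeated symbols collapse correctly. This is a textbook sketch for illustration only, not the toolkit's batched GPU implementation, and it omits language-model fusion and phrase boosting.

```python
from collections import defaultdict

def ctc_beam_search(probs, beam_size=4, blank=0):
    """Minimal CTC prefix beam search.
    probs: per-frame probability lists over the vocabulary (blank at index 0)."""
    # prefix -> (prob of paths ending in blank, prob ending in non-blank)
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        new = defaultdict(lambda: (0.0, 0.0))
        for prefix, (pb, pnb) in beams.items():
            for s, p in enumerate(frame):
                if s == blank:
                    b, nb = new[prefix]
                    new[prefix] = (b + (pb + pnb) * p, nb)
                elif prefix and prefix[-1] == s:
                    # repeated symbol: without an intervening blank it collapses
                    b, nb = new[prefix]
                    new[prefix] = (b, nb + pnb * p)
                    b2, nb2 = new[prefix + (s,)]
                    new[prefix + (s,)] = (b2, nb2 + pb * p)
                else:
                    b, nb = new[prefix + (s,)]
                    new[prefix + (s,)] = (b, nb + (pb + pnb) * p)
        # keep only the top-scoring prefixes
        beams = dict(sorted(new.items(), key=lambda kv: -sum(kv[1]))[:beam_size])
    return list(max(beams, key=lambda k: sum(beams[k])))

# vocab: 0 = blank, 1 = "a"; two frames both favoring "a" collapse to one "a"
print(ctc_beam_search([[0.2, 0.8], [0.2, 0.8]]))  # [1]
```

A GPU toolkit like the one described above batches this per-frame update across utterances and beams, which is what removes the sequential CPU bottleneck.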